REMARKS 

Applicant received an Office Action dated 3/16/2006 from Examiner Eric 
Coleman for Ser. No. 10/681,404, filed 10/08/2003. 

Applicant reminds the Examiner that the present CIP (Ser. No. 
10/681,404) and its parent case (Ser. No. 09/477,047, filed 12/31/99), having the 
same claim 1, have been pending for over 6 years and have undergone 2 searches 
and 3 substantive non-final office actions. The previous Examiner Scott Collins 
left the Patent Office but indicated before he left that he was going to allow the 
case when it was released to him. He quit before it was released to him. 
Applicant understands that the Examiner needs to conduct his own examination. 
Applicant, however, wishes the Examiner to keep in mind the long length of time 
the case has been pending and the extensive examination that has already been 
conducted. 

Applicant has amended the Specification to make it consistent with FIG. 2. 
The term RCL 42 in the Specification has been changed to make it consistent with the 
term RCA 42 (i.e. Reconfigurable Combinational Logic Array) shown in FIG. 2. 
This change was made in a number of locations in the specification. Marked up 
copies of the affected paragraphs are in this amendment. Clean copies of the 
marked up paragraphs are attached as APPENDIX 1. 

In response to Paragraphs 2 and 3 of the Examiner's Office Action, 
Applicant has amended Claim 6 to be dependent on Claim 5 rather than Claim 2. 
This was an error in Claim 6. Claim 6 contains a "counter" which has its 
antecedent basis in Claim 5 and not Claim 2. This "counter" is first claimed in 
Claim 5. In addition, Applicant has amended Claim 1 line 4 to add the words 
"said program" after the word "next". This provides antecedent basis from the 
"program instructions" cited in Claim 1 line 2. This makes it clear that the 
Program Counter Indicator is the address of the next program instruction for the 
First Processor. 

Examiner Interview of March 30, 2006 

Applicant had an interview with the Examiner on March 30, 2006. 
Applicant and the Examiner discussed the use of the word "sharing" in lines 3 
and 6 of claim 1. The claim language specifies that the First Processor and the 
Function Lookup Unit both share the Program Counter. Examiner Coleman felt 
that this was too broad. Applicant stated that it was the intent of Claim 1 that the 
Program Counter should be accessed in parallel by the First Processor and the 
Function Lookup Unit as shown in FIG. 1. Applicant agreed to amend Claim 1 to 
specify that the First Processor and the Function Lookup Unit access the Program 
Counter in parallel. Applicant amended Claim 1 at line 6 to add the language 
"said first processor and said functional lookup unit sharing parallel access to said 
program counter and registers". 

The Examiner and Applicant continued on to discuss the differences 
between Applicant's invention in claim 1 and the Yard (5,892,934) and Gilson 
(5,361,373) references. One of the primary differences between these two 
references and Applicant's invention is as follows. In the Yard reference (FIG. 2), 
microprocessor 12 needs to fetch (i.e. from memory cache 42) and decode (i.e. 
from Instruction decode Unit 46) a subroutine call instruction having a target 
address of a DSP function and then pass the target address to the DSP 14. The 
DSP then executes the routine stored in the target address. In contrast, 
Applicant's invention includes a traditional CPU (i.e. CCC 12 in FIG. 1) and a 
Function Lookup Unit (FLU) (14 in FIG. 1) containing a table of specialized 
accelerating functions each of which is identified by an address. An address from 
program counter (PC) 16 is accessed in parallel by both the conventional CCC 12 
and the FLU 14 so that one of them can execute the required function. If there is 
not a match for the PC 16 address in the FLU table, the FLU and CCC will 
coordinate so that the CCC can perform the function. If there is a match in the 
FLU table, the FLU performs the function associated with the address. In 
summary, Applicant's claim 1 differs from Yard in that Yard needs to fetch 
instructions from instruction memory to perform a DSP function. Applicant's 
FLU does not need to fetch any instructions from the instruction memory to 
perform a similar specialized function in the FLU. The need to fetch instructions 
from instruction memory before executing them is called the "Von Neumann 
bottleneck". This is the busiest part of the computer and slows the computer. In 
short, Yard does not present a program counter indicator in parallel at both its 
memory cache 42 and its DSP in the same manner as Applicant's Claim 1. 

The above was all discussed during the interview. 

To help the Examiner understand the parallel nature of the CCC 12 and 
FLU 14 access to the program counter 16 the following comments are provided 
by Applicant. CCC 12 and FLU 14 take turns advancing the program counter and 
registers (specification Page 5, lines 3-5). They take turns executing parts of the 
program. They are never active at the same time (page 8, lines 7-9). They 
interact so that one of them executes the program while the other is resting. The 
flow charts in FIGS. 4 and 5 show the operation of CCC 12 and FLU 14, respectively, 
and how they interact so that one of them may execute the instruction properly. 
These flow charts dictate where, in the interactive flow of operations of CCC 12 
and FLU 14, each of them accesses the program counter 16 during the instruction 
execution cycle. 

RESPONSES TO PARAGRAPHS 4-14 OF THE EXAMINER'S OFFICE ACTION 

PARAGRAPHS 4 and 5 

Paragraph 5 of the Examiner's Office Action rejects Claims 1-4 and 7-9 
under 35 USC 103(a) as being unpatentable over Yard (5,892,934) in view of 
Gilson (5,361,373). Yard is the Examiner's primary reference. Below is a 
detailed description of the differences of Yard and Applicant's Claims 1-9. 

DIFFERENCE BETWEEN YARD AND APPLICANT'S CLAIMS 1-9 

In the Yard reference (FIG. 2), microprocessor 12 needs to fetch (i.e. from 
memory cache 42) and decode (i.e. from Instruction decode Unit 46) a 
subroutine call instruction having a target address of a DSP function and then 
pass the target address to the DSP 14. This is done serially. The DSP then 
executes the routine stored in the target address. At the conclusion of the 
subroutine code sequence, a corresponding subroutine return instruction is 
fetched from cache 42 and executed in microprocessor 12. The subroutine return 
instruction uses the sequential address stored by the most recently executed call 
instruction as a target address (col. 3, line 25 - col. 4, line 4). 

The first disadvantage of the Yard approach is that an instruction fetch is 
needed from memory cache 42 to execute a DSP function. Therefore, it is subject 
to the "Von Neumann bottleneck" problem (i.e. fetching instructions from the 
instruction cache, which is the busiest part of the computer). A second 
disadvantage is that every DSP function performed by DSP 14 must be planned 
and programmed into the instruction memory cache 42 by the programmer. This 
includes both the subroutine call and subroutine return instruction fetches. In 
addition, the DSP address table must be loaded with explicit values (subroutine 
addresses) after the program has been compiled and linked. If the program in 
microprocessor 12 is changed, or if the DSP mechanism is to be used with other 
new general purpose programs, the DSP address table must be reloaded with 
different values. Manual programmer intervention to update the DSP tables is 
necessary for such new general programs. This means recompiling every time a 
new application program is used. Yard provides no disclosure of how to update such 
DSP programs automatically. 

In contrast, Applicant's Claim 1 includes a traditional CPU (i.e. CCC 12 in 
FIG. 1) (Claim 1, line 3) and a Function Lookup Unit ("FLU") (14 in FIG. 1) 
(Claim 1, lines 6-27) containing a table of specialized accelerating functions, 
each of which is identified by an address. An address from program counter (PC) 
16 (Claim 1, lines 3-5) is accessed by both the conventional CCC 12 and the FLU 
14 in parallel so that one of them can execute the required function. If there is not 
a match for the PC 16 address in the FLU table, the FLU and CCC will coordinate 
so that the CCC can perform the function. If there is a match in the FLU table, 
the FLU performs the function associated with the address. At the same time 
that the PC 16 address is accessed by the CCC and FLU, a counter in the 
FLU keeps count of the number of times the PC 16 address has been accessed 
(Claims 5, 6). If the same PC 16 address has been accessed a number of times, 
the FLU recognizes that this function is a "hot" function for which there should 
be a specialized accelerating function in the FLU (Applicant's Claims 2, 4, 5 and 6). 
An exception routine will generate the function "on the fly" and put it in the FLU 
table for future use when the PC 16 address comes up again. The above sequence 
will be repeated as each PC 16 address is accessed by the FLU. The functions may 
also be generated by preloading (Claim 3) (specification, page 5, lines 17, 18). The 
above is an overview description of Applicant's invention. The Examiner is 
referred to Applicant's previous response dated March 25, 2003 (Pages 4-6) for a 
detailed schematic description. 
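
For illustration only, the lookup behavior summarized above may be sketched in software as follows. This is a simplified model and not the claimed hardware: the names FunctionLookupUnit, step, ccc_execute_one and sum_block, the 4-byte instruction size, and the addresses 0x100 and 0x120 are assumptions made solely for this example.

```python
from typing import Callable, Dict, Optional, Tuple

Registers = Dict[str, int]
# A stored function updates the registers and returns the next program counter value.
StoredFunction = Callable[[Registers], int]


class FunctionLookupUnit:
    """Hypothetical model of an FLU: a table of functions keyed by PC address."""

    def __init__(self) -> None:
        self.table: Dict[int, StoredFunction] = {}

    def lookup(self, pc: int) -> Optional[StoredFunction]:
        return self.table.get(pc)


def ccc_execute_one(pc: int, regs: Registers) -> int:
    """Stand-in for the conventional CPU: pretend to run one 4-byte instruction."""
    return pc + 4


def step(pc: int, regs: Registers, flu: FunctionLookupUnit) -> Tuple[int, str]:
    """Present the same PC value to both units; exactly one of them executes."""
    fn = flu.lookup(pc)
    if fn is not None:
        return fn(regs), "FLU"  # match: no instruction fetch is modeled
    return ccc_execute_one(pc, regs), "CCC"  # no match: conventional path


def sum_block(regs: Registers) -> int:
    """Example stored function standing in for a block of instructions at 0x100."""
    regs["acc"] = sum(range(regs["n"]))
    return 0x120  # assumed address of the first instruction after the block
```

In this sketch the only input to the lookup is the program counter value itself. No instruction is fetched or decoded in order to decide whether the stored function applies.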

Applicant's Claim 1 differs from Yard in that in Claim 1 the FLU does 
not need to fetch instructions from instruction memory to perform a DSP function 
as does Yard. Therefore, it is not subject to the "Von Neumann bottleneck" 
problem. In addition, it does not need a programmer to program in every DSP 
function in microprocessor 12 and update the DSP table every time a new 
application program is used. Thirdly, Applicant's invention is independent of 
anything in the program. The FLU has its own table of functions and watches the 
application program as it progresses, by watching the PC 16 address. If it sees a 
PC 16 address that matches a function identifier in its function table then it 
performs the function. But just as importantly a counter keeps track of how many 
times a PC 16 address has been presented. If the PC 16 address appears 
frequently, the FLU considers it a "hot" address and requests that an exception 
routine generate an appropriate function for that address and put it in the FLU 
table so it may be used the next time the PC address comes up. All of this is done 
with no interaction with the program or with the CCC. The only interaction with 
the CCC is when the CCC and FLU coordinate to see which performs the 
function required by the PC 16 address or when the CCC performs an exception 
routine. In this way, no matter how the application changes, or WHATEVER 
application is run, there is an acceleration mechanism which can detect hot 
program regions, translate them to logic, and then insert these logical functions 
(by loading the right PC values into the FLU cache, NOT by changing the 
program in memory or in the INSTRUCTION cache) into the program flow, all 
unbeknownst to the authors or maintainers of the original programs, and without 
requiring any specific knowledge of ANY program - only the semantics of 
sequences of conventional CPU instructions. 
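
The counting behavior described above may likewise be sketched, for illustration only, as a small software model. The class name HotAddressTracker, the threshold value, and the generate callback (a stand-in for the exception routine) are assumptions of this example and are not taken from the specification.

```python
from collections import Counter
from typing import Any, Callable, Dict


class HotAddressTracker:
    """Hypothetical model: count PC addresses and install a function once one is 'hot'."""

    def __init__(self, threshold: int, generate: Callable[[int], Any]) -> None:
        self.threshold = threshold          # assumed count at which an address is "hot"
        self.generate = generate            # stand-in for the exception routine
        self.counts: Counter = Counter()    # per-address access counts
        self.table: Dict[int, Any] = {}     # PC address -> stored function

    def observe(self, pc: int) -> bool:
        """Return True if the table already holds a function for this PC address."""
        if pc in self.table:
            return True
        self.counts[pc] += 1
        if self.counts[pc] >= self.threshold:
            # The address is hot: generate a function for use the next time it comes up.
            self.table[pc] = self.generate(pc)
        return False
```

Note that the model never reads or modifies the program itself. It observes only the sequence of program counter values, which is the point made in the paragraph above.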

PARAGRAPH 6 

In paragraph 6, the Examiner cites a First Processor(12) (paragraph 6(a)) 
and a Function Lookup Unit (46) (paragraph 6(c)) as being the equivalent of 
those elements in Applicant's Claim 1. Referring the Examiner back to 
paragraph 5, this is an incorrect comparison because Applicant's First Processor 
and FLU access the Program Counter in parallel in Claim 1. The FLU does not 
need to fetch a program instruction to see the instruction address. In contrast, in 
Yard, the Examiner's claimed equivalent Function Lookup Unit (46) needs to 
obtain the subroutine call instruction via an instruction fetch from the Instruction 
Cache 42 in Microprocessor (12). In Yard, the physical connections, signals 
passed, operation and results are all different than in Claim 1. Paragraph 5 
describes all the disadvantages in the Yard structure. The Examiner's comments 
in paragraph 6(c) lines 8-17 relating to tags and program control are also not 
appropriate because they relate to the different structure of Yard and Claim 1. 



PARAGRAPH 7 

The Examiner makes comments about the control of the program by the 
DSP or the Microprocessor 12. The structures in Claim 1 and in Yard that control 
the execution of a program instruction by the general purpose processor or a 
special function unit are totally different. In claim 1, the first processor and the 
function lookup unit (FLU) both directly access the program counter (i.e. in 
parallel) and then make a decision which will execute the program instruction 
identified by the program counter. However, in Yard the structure is not the 
same. In Yard, the Instruction Cache 42 in Microprocessor 12 accesses the 
program counter but Instruction Decode Unit 46 and DSP 14 do not (FIG. 2 and 
col. 5, lines 28, 29). Because they don't access the program counter directly and 
instead must fetch an instruction first, they have all of the "Von Neumann 
bottleneck" and programming problems mentioned in the comments for 
PARAGRAPH 6. Yard does not read on Claim 1 because of these structural 
differences. 



PARAGRAPH 8 

In paragraph 8 the Examiner states "Yard did not expressly detail that the 
program counter was shared ... Also since the system coupled the DSP to the 
same decoder as the other execution units one of ordinary skill would have been 
motivated to share the program counter between the DSP and the first processor." 
Applicant does not understand this, since the clear teaching of Yard in FIG. 2 
(also col. 5, lines 28, 29) is that neither the DSP 14 nor execution units 48 access 
the program counter. The instruction cache 42 holds the program instructions and 
hence accesses the program counter. The instruction cache 42 then couples its 
output to the instruction decoder 46 and then to DSP 14 and execution units 48. 
The instruction fetch before the signal is coupled to either the execution units 48 
or DSP 14 causes the "Von Neumann bottleneck" problem described previously. 
Applicant doesn't have this problem in his FLU because he does access the 
program counter directly. Yard teaches one skilled in the art away from being 
motivated to have the DSP 14 and execution units 48 access the program counter. 
In addition, Yard does not teach anything at all about overcoming the "Von 
Neumann bottleneck" problem. 

Applicant's Claim 1 (lines 2-5) is very clear that the first processor executes 
program instructions by accessing the program counter. It is equally clear in 
Claim 1, line 5 that the FLU accesses the program counter in parallel with the first 
processor. 

PARAGRAPHS 9 AND 10 

In paragraph 9 the Examiner substitutes the FPGA (12) of Gilson in for the 
DSP (14) of Yard. If FPGA (12) is substituted, it suffers from exactly the same 
problem as the DSP(14) in Yard as discussed in paragraphs 6,7 and 8. There still 
needs to be an instruction fetch from instruction cache 42 in Yard which then 
couples to the instruction decoder unit 46 which determines whether the DSP (or 
FPGA 12) is activated. Neither the DSP (14) of Yard nor the FPGA 12 of Gilson 
accesses the program counter directly as does the FLU in Applicant's Claim 1. The 
combination of Yard and Gilson has the same "Von Neumann bottleneck" 
problem and reprogramming problems as detailed in paragraphs 7 and 8. 

In paragraph 10 the Examiner contends that since both references were 
directed to increasing the efficiency of processors, one skilled in the art would be 
motivated to add the Gilson teachings of the reconfigurable array. Applicant 
takes issue with the Examiner's contention. Yard teaches away from this 
conclusion by the Examiner since it teaches that first the subroutine call 
instruction is fetched from the instruction cache and then a matching address is 
sought in the DSP table. Yard teaches one skilled in the art to fetch an instruction 
from instruction memory to identify the function to be performed and nothing 
about relieving the burden of fetching instructions. There is not one word in 
either the Yard or Gilson reference about addressing the issue of the fetching of 
instructions and its impact on "Von Neumann bottleneck" problems. In addition, 
neither Yard nor Gilson addresses the issue of eliminating the need to reprogram 
the conventional processor to change opcodes (and other information) each time a 
new application program is used. If these issues were as easy to recognize as the 
Examiner contends, Yard and Gilson would both have recognized them. There 
are a huge number of issues to be considered in designing a processor and just as 
many design options and tradeoffs to address those issues. How would a designer 
know to pick the issue of the Von Neumann bottleneck out of the huge number of 
design issues when Yard teaches away? The designer would not know unless he 
were aware of the inventor's teachings. The Examiner is using hindsight after 
seeing Applicant's teaching to say that the solution is obvious. 



PARAGRAPH 11 

Claim 2 is dependent on claim 1 and adds an exception routine to the 
structure of claim 1. In claim 2, the function lookup unit (FLU) provides an 
attention signal (claim 2, line 3) which activates an exception routine that 
provides a logic function for the block of program instructions identified by the 
program counter (claim 2, lines 2, 3). The exception routine provides the 
function to the reconfigurable combinational array (claim 2, line 4) and a function 
indicator to the function indicator field (claim 2, line 5) in the lookup cache 
(claim 1). 

The Examiner in Paragraph 11, lines 3 and 4 contends that Gilson provides 
"a corresponding function indicator (e.g. see col. 5, lines 1-67)". In Gilson, the 
host 40 (col. 7, lines 22-26) provides reconfigurational data to the FPGA 12 
where a new function is generated in Reconfigurable Instruction Execution unit 
16. Few details are given and it is unclear as to how functions are managed in the 
Reconfigurable Instruction Execution unit 16. There are few details given as to 
how the host 40 would go directly to the Reconfigurable Instruction Execution 
unit 16 if the desired function were already there. The only pathway clearly 
described in Gilson is providing reconfigurational data to FPGA 12 and then 
generating the desired function in Reconfigurable Instruction Execution unit 16. 
Applicant can't find a function indicator, as recited by the Examiner, in col. 5, 
lines 1-67 or in the rest of the specification. 

In addition, in paragraph 11 the Examiner recites Yard as using the 
program counter to identify a block of instructions that the DSP performs. This is 
the same contention that Applicant discredited in paragraphs 5-8. Yard fetches an 
instruction from Instruction Cache 42 with its program counter. The instruction 
is then coupled to decoder 46 and eventually to DSP 14. This has the "Von 
Neumann bottleneck" and programming problems cited previously. 

PARAGRAPH 12 

Applicant understands the Examiner's requirement to construe the 
language of Claim 3 as broadly as possible. However, even in its broadest 
interpretation Claim 3 does not read on the teachings of Gilson. In claim 3 logic 
functions (defined in claim 1, lines 19,20) take the place of a block of program 
instructions to be executed by the First Processor (Claim 1, line 2). In Gilson 
neither col. 6, lines 1-67 nor col. 7, lines 5-57 discuss preloading of the functions 
to take the place of a block of program to be executed by the Host processor 40. 
What is disclosed in Gilson is that functions such as the RISC processor 
necessary to run the Reconfigurable Execution Unit 16 are preloaded into the 
FPGA 12. However, the functions to be generated by the Reconfigurable 
Execution Unit 16 (i.e. to take the place of a block of Host 40 instructions) are not 
preloaded. They are generated new each time the Host 40 sends reconfigurational 
data to FPGA 12 (col. 7, lines 5-37). The functions are not saved for future use by 
Host 40. 

Claim 4 is dependent on claim 2. This claim requires that, as the logic 
functions are created on the fly by the exception routine (claim 2), they be 
stored in the RCA and a function indicator be stored in the lookup cache (claim 1, 
line 9). Gilson does not do any of this. As stated previously, Host 40 sends 
reconfiguration data to FPGA 12 and the Reconfigurable Execution Unit 16 
creates the function. The function is not saved for future Host 40 use and no 
function indicator is sent anywhere (col. 7, lines 5-37). 

PARAGRAPH 13 

Claims 7, 8 and 9 state that when the First Processor overwrites the 
contents of a program counter address, the Function Lookup Unit (i.e. FLU) will 
look for a match to the program counter address in its lookup cache (Claim 1, 
lines 11, 12). If it finds a match it will deactivate that address in the lookup 
cache. Yard does not have a function lookup unit (FLU) which accesses the 
program counter. In Yard, Instruction Cache 42 is the only element that accesses the 
program counter. In addition, Yard does not have the lookup cache (i.e. within 
the FLU) holding the addresses of logic functions that perform in place of blocks 
of program instructions (see PARAGRAPHS 5 and 6). Yard cannot deactivate an 
address in the lookup cache in a function lookup unit (as in Applicant's claims 7-9) 
because it does not have an equivalent lookup cache or function lookup unit. 
The Examiner's comments in paragraph 13 rely on Yard having such structure. 
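
For illustration only, the deactivation behavior of claims 7-9 as summarized above may be sketched as follows. The class name LookupCache, the method names install, match and on_store, and the use of an "active" flag are assumptions of this example and do not come from the claims or the specification.

```python
from typing import Dict, Optional


class LookupCache:
    """Hypothetical model of a lookup cache holding function indicators by address."""

    def __init__(self) -> None:
        # address -> {"indicator": function indicator, "active": validity flag}
        self.entries: Dict[int, dict] = {}

    def install(self, address: int, indicator: int) -> None:
        self.entries[address] = {"indicator": indicator, "active": True}

    def match(self, address: int) -> Optional[int]:
        """Return the function indicator only if an active entry matches the address."""
        entry = self.entries.get(address)
        if entry is not None and entry["active"]:
            return entry["indicator"]
        return None

    def on_store(self, address: int) -> bool:
        """Called when the processor overwrites an address; deactivate any matching entry."""
        entry = self.entries.get(address)
        if entry is not None and entry["active"]:
            entry["active"] = False
            return True
        return False
```

The sketch presupposes a lookup cache keyed by program counter address. Without such a structure there is no entry to deactivate, which is the distinction drawn over Yard above.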






PARAGRAPHS 14 AND 15 

With respect to paragraphs 14-15 of the Examiner's office action, claim 5 is 
dependent on claim 2 and claim 6, as amended, is dependent on claim 5. Both 
claims are patentable for the same reasons as claim 2. 
In addition, the Greenbaum reference does not specify that the number of times a 
nested loop is performed is counted. There is nothing specified in Greenbaum 
that indicates there is a need to count the number of times a nested loop is 
performed. This is in contrast to claims 5 and 6 where a threshold is specified 
before a new function is created. It appears that Greenbaum would be more like 
the operation of the stored functions in the FLU where, once the function is 
created, it is resident there until there is a request to perform it. 

In view of the above changes and remarks, allowance of the claims is 
respectfully requested. Applicant has included a clean copy of the Claims with 
no amendment markings as APPENDIX 2. 

Should the examiner have any questions he is invited to call Applicant's 
attorney at the number given below. 



Respectfully submitted, 




Date: June 1, 2006 



David G. Rasmussen 
Attorney for Applicant 
Reg. No. 26795 
Telephone: (508)435-9607 

David G. Rasmussen 
Attorney at Law 
8 Hazel Rd. 
Hopkinton, MA 01748 